-
Notifications
You must be signed in to change notification settings - Fork 1.6k
repr(ordered_fields) #3845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
repr(ordered_fields) #3845
Conversation
Not to add too many extra colours to the list, but (Note: those three things should cover every case I've seen that uses Whereas Also, while it may be more technical than most users need to understand, it would be helpful if the RFC reiterated the current issues with |
Just as a small point of style the Guide Level Explanation is usually "what would be written in the rust tutorial book", and the Reference Level Explanation is "what could be written into the Rust Reference". This isn't a strict requirement, but personally I'd like to see the Reference Level part written out. Using the present tense, as if the RFC was accepted and implemented. |
add more unresolved questions
Rework struct layout description.
text/3845-repr-ordered-fields.md
Outdated
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
`repr(ordered_fields)` is a new representation that can be applied to `struct`, `enum`, and `union` to give them a consistent, cross-platform, and predictable in memory layout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`repr(ordered_fields)` is a new representation that can be applied to `struct`, `enum`, and `union` to give them a consistent, cross-platform, and predictable in memory layout. | |
`repr(ordered_fields)` is a new representation that can be applied to `struct`, `enum`, and `union` to give them a consistent, cross-platform, and predictable in-memory layout. |
"cross-platform" -- the layout will differ when there are different layouts for struct members' types, in particular primitive types can have different alignments which changes the amount of padding.
e.g., #[repr(ordered_fields)] struct S(u8, f64);
doesn't have the same layout on x86_64 and i686
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this will need to be documented as a hazard in the ordered_fields
docs. However, the repr
itself will be cross-platform. For example, #[repr(ordered_fields)] struct Cross([u8; 3], SomeEnum);
will be truly cross-platform (given that SomeEnum
is!).
Co-authored-by: Jacob Lifshay <[email protected]>
Just voicing support for |
Nominating this so that we can do a preliminary vibe-check on it in a lang triage meeting. |
Currently `repr(C)` serves two roles | ||
1. Provide a consistent, cross-platform, predictable layout for a given type | ||
2. Match the target C compiler's struct/union layout algorithm and ABI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big fan of doing this split, especially for struct
s. (It's less obvious what choices to make for other things, IMHO, but at least for structs this is something I've wanted for ages, so that for example Layout::extend
can talk about it instead of C
.)
Pondering the bikeshed: declaration_order
or something could also be used to directly say what you're getting.
(This could be contrasted with other potential reprs that I wouldn't expect this RFC to add, but could consider as future work, like a deterministic_by_size_and_alignment
where some restricted set of optimizations are allowed but you can be sure that usize
and NonNull<String>
can be mixed between different types while still getting the "same" field offsets, for example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is also useful for unions, so we don't need to rely on repr(C)
to ensure that all fields of a union
are at offset 0.
This could be contrasted with other potential reprs that I wouldn't expect this RFC to add...
This also works as an argument against names like repr(consistent)
, since there are multiple consistent and useful repr
, making it not descriptive enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that declaration_order
or ordered_fields
is a bit weird on a union
, because of course they're not really in any "order".
It makes me ponder whether we should just have repr(offset_zero)
for union
s to be explicit about it, or something.
(Which makes me think of other things like addressing rust-lang/unsafe-code-guidelines#494 by having a different constructs for "bag of maybeuninit stuff that overlap" vs "distinct options with an active-variant rule for enum-but-without-stored-discriminant". But those are definitely not this RFC.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind spelling it as repr(offset_zero)
for unions if that helps get this RFC accepted 😄. However, I have a sneaking suspicion that this isn't the contentious part of this RFC.
I know the name isn't optimal (intentionally). This can be hashed out after the RFC is accepted (or even give a different name for all of struct
, union
, and enum
).
The most important bit for me is just that we do the split (for all of struct
, union
, and enum
, to be consistent).
I've updated how enums's tags are specified, now they just defer to whatever |
text/3845-repr-ordered-fields.md
Outdated
|
||
`repr(C)` in edition <= 2024 is an alias for `repr(ordered_fields)` and in all other editions, it matches the default C compiler for the given target for structs, unions, and field-less enums. Enums with fields will be laid out as if they are a union of structs with the corresponding fields. | ||
|
||
Using `repr(C)` in editions <= 2024 triggers a lint to use `repr(ordered_fields)` as a future compatibility lint with a machine-applicable fix. If you are using `repr(C)` for FFI, then you may silence this lint. If you are using `repr(C)` for anything else, please switch over to `repr(ordered_fields)` so updating to future editions doesn't change the meaning of your code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too noisy. Most code out there using repr(C) is probably fine - IIUC, if you're not targeting Windows or AIX, maybe definitely fine? - and having a bunch of allow(...) across a bunch of projects seems unfortunate.
Maybe we can either (a) only enable the lint for migration, i.e., the next edition's cargo fix would add allows for you or (b) we find some new name... C2 for the existing repr(C) usage to avoid allows. But (b) also seems too noisy to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could just be an optional edition compatibility lint, so if someone enables e.g. rust_20xx_compatibility
it shows up but otherwise not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) only enable the lint for migration
That was the intention, hence the name edition_2024_repr_c
. I'll make this more clear, that this is intended to be a migration lint.
Rustfix would update to #[repr(ordered_fields)]
to preserve the current behavior. For the FFI crates, #![allow(edition_2024_repr_c)]
at the top of lib.rs
would suffice. If you have a mix of FFI and non-FFI uses of repr(C)
, then you'll have to do the work to figure out which is which, no matter what option is chosen to update repr(C)
- even adding repr(C2)
, since then the FFI use case would need to update all their reprs to repr(C2)
.
Overall, I think this scheme only significantly burdens those who have a mix of FFI and non-FFI uses of repr(C)
. But they were going to be burdened no matter what option was chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the new wording/lints too noisy still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new wording is still too noisy. We shouldn't assume that most people using repr(C)
are using it for ordering rather than FFI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wasn't my intention, but I don't see another way to do all of the following in the next edition:
- make
repr(C)
mean - same layout/ABI as what the standard C compiler does - make
repr(ordered_fields)
- the same algorithm that's listed forrepr(C)
in the Rust reference - ensure that everyone who upgrades to the next edition gets the layout they need (as long as they read the warnings and follow the given advice)
- make it as painless as possible for people who don't mix FFI and stable ordering cases (which I suspect is the vast majority of people). In other words, each crate currently uses
repr(C)
either exclusively for FFI or exclusively for some stable layout. - for people who do mix FFI and stable ordering cases in one crate, at least the warning should give them all the places they need to double-check, and they can silence the warning on a case-by-case basis.
I'm open to suggestions on how to handle the diagnostics. Within these constraints, I think my solution is the only real option we have. If there are some objections to these constraints, I would like to hear those too, maybe I missed the mark with these constraints, and missed a potential solution because of it.
text/3845-repr-ordered-fields.md
Outdated
} | ||
``` | ||
|
||
Enums with fields will be laid out as if they were a union of structs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement matches the algorithm currently used for "primitive representations" (repr(u8)
etc), not the algorithm currently used for repr(C)
or repr(C, uN)
. If this change is made, then anyone in need of specifically the current repr(C)
enum representation cannot migrate to repr(ordered_fields)
.
The difference between the two algorithms (besides the integer type used for the tag) is that in repr(C)
, all variants have their first field at the same offset, whereas in repr(uN)
, each variant’s first field has only the offset required by its type’s alignment, so individual variants may be at a lower offset. The repr(C)
enum acts as (and is documented as) a “tagged union” that contains an “untagged union”, whereas repr(uN)
does not.
Example of the difference:
#![feature(offset_of_enum)]
#[repr(align(8))]
struct A64(u64);
#[repr(C, i32)]
enum ExampleC {
Foo(u8),
Bar(A64),
}
#[repr(i32)]
enum ExamplePrim {
Foo(u8),
Bar(A64),
}
fn main() {
dbg!(
std::mem::offset_of!(ExampleC, Foo.0), // prints 8
std::mem::offset_of!(ExamplePrim, Foo.0), // prints 4
);
}
I think that either repr(ordered_fields, uN)
should instead exactly mimic repr(C, uN)
in this regard, or — perhaps a better idea — there should be a separate explicitly named representation for this, since it is rare to actually need this layout, and so users merely seeking predictable layout are better served by repr(uN)
, but they might unthinkingly put repr(ordered_fields)
on their enums along with their structs..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I didn't realize there was that difference, thanks for bringing this up. I think repr(ordered_fields)
should match what repr(C)
does now, to ease migrations (even in combo with the primitive reprs). We could add lints to suggest changing to just a primitive repr if repr(ordered_fields)
is used on an enum.
I think making the migration as painless as possible will make it much easier to split up repr(C)
up. And we can also add guidance towards other reprs via lints.
I'm open to having the repr be named differently for the different kinds of type. (For example, repr(offset_zero)
for unions was considered in another thread)
text/3845-repr-ordered-fields.md
Outdated
|
||
Of course, making `SomeFFI` size 8 doesn't work for anyone using `repr(C)` for case 1. They want it to be size 0 (as it currently is). | ||
|
||
A tertiary motivation is to make progress on a workaround for the MSVC bug [rust-lang/rust/112480](https://github.com/rust-lang/rust/issues/112480). This proposal doesn't attempt a complete solution for the bug, but it will be a necessary component of any solution to the bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the third motivation, then what is the secondary motivation?^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it wasn't super clear,
The motivations are
- Splitting up two contradictory use-cases for
repr(C)
- Correct handling of ZSTs on MSVC repr(C) on MSVC targets does not always match MSVC type layout when ZST are involved rust#81996
- The MSVC alignment bug MSVC on x86-32 Windows fails to align variables to their required alignment rust#112480
But also, I was trying to hint that this isn't the most important motivation. I'll reword this to be less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how 1 is a motivation in itself -- 2 and 3 are the motivation to do 1. If it wasn't for 2 and 3, why would we care about 1? I don't think the idealistic "these two cases should not be conflated" really stands as a motivation (for a breaking change!) if there's no downside to conflating them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've updated the motivation section to remove references to numbers, so the motivation section should flow better.
I don't think the idealistic "these two cases should not be conflated" really stands as a motivation (for a breaking change!) if there's no downside to conflating them.
Yes, I agree that this doesn't stand on it's own. However, it doesn't need to, since it is one (small) motivation among others that make the case for this change. A small motivation to help add context to other more important motivations.
In this case, there is a downside to conflating them (as seen in the other linked issues). I guess I could argue that this is really the only motivation, with the others as evidence that there are downsides to this conflation. But that feels too much like arguing about semantics, which I would like to avoid.
Co-authored-by: +merlan #flirora <[email protected]>
* update repr(ordered_fields) algorithm for `enum` * add to and update motivation section * add potential drawbacks to some lints
Co-authored-by: Jacob Lifshay <[email protected]>
The RFC does a good job at establishing that there are two potentially distinct use for today's |
## `repr(ordered_fields)` | ||
|
||
> The `ordered_fields` representation is designed for one purpose: create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type | ||
> | ||
> This representation can be applied to structs, unions, and enums. | ||
> | ||
> - edited version of the [reference](https://doc.rust-lang.org/stable/reference/type-layout.html#the-c-representation) on `repr(C)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think would be nice to guarantee that if a repr(ordered_fields)
ADT is layout-compatible (same size, alignment, and field offsets, discounting repr(Rust)
ZST fields) with a struct
or union
defined in C, then it is also ABI-compatible with that C struct
/union
, as far as possible. This would allow using repr(ordered_fields)
to interoperate with C code that defines types with weird pragma
s, like AIX #pragma align (natural)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible, and I've listed something similar as an unresolved question (should repr(ordered_fields)
have a well-defined ABI?).
However, I think it would be best to punt this to a future RFC/ACP/discussion/etc. It doesn't seem to be required to split repr(C)
. Matching a weird pragma on a Tier 3 platform is not enough motivation to add to this RFC. (As I would like to keep this as lean as possible, only keeping the necessary portions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we currently do have an internal linear repr which is used for Box
and that does not inhibit ABI optimizations, that's necessary for box to act as a pointer type.
text/3845-repr-ordered-fields.md
Outdated
|
||
Of course, making `SomeFFI` size 8 doesn't work for anyone using `repr(C)` for case 1. They want it to be size 0 (as it currently is). | ||
|
||
The next two cases will not be solved by this RFC, but this RFC will provide the necessary steps towards the respective fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seem to be 3 cases that follow (nesting packed/align, the MSVC win32 problem with u64/i64, and AIX).
I'd recommend more explicitly formatting this with markdown headers or bold paragraph headers or so to make it clear what the separate examples are. I think you have 4 examples in total now?
text/3845-repr-ordered-fields.md
Outdated
This also plays a role in [#3718](https://github.com/rust-lang/rfcs/pull/3718), where `repr(C, packed(N))` wants allow fields which are `align(M)` (while making the `repr(C, ...)` struct less packed). This is a footgun for normal uses of `repr(packed)`, so it would be better to relegate this strictly to the FFI use-case. However, since `repr(C)` plays two roles, this is difficult. | ||
|
||
By splitting `repr(ordered_fields)` off of `repr(C)`, we can allow `repr(C, packed(N))` to contain over-aligned fields (while making the struct less packed), and (continuing to) disallow `repr(ordered_fields, packed(N))` from containing aligned fields. Thus keeping the Rust-only case free of warts, without compromising on FFI use-cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something you intend to do as part of this RFC, or is it a future possibility? If the former, you should specify the exact behavior in the RFC. If the latter, it should be mentioned in the future possibilities section.
Also, what is the motivation for “(continuing to) disallow repr(ordered_fields, packed(N))
from containing aligned fields”? Should we not simply have the packed
take precedence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think there's a pretty obvious semantics for repr(ordered_fields, packed(N))
with repr(aligned)
fields -- treat those fields like all other fields, i.e., cap their alignment to N. Not sure if that's worth punting to a future possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something you intend to do as part of this RFC, or is it a future possibility? If the former, you should specify the exact behavior in the RFC. If the latter, it should be mentioned in the future possibilities.
This is a reference to another RFC, I'll make this more clear.
Also, what is the motivation for “(continuing to) disallow repr(ordered_fields, packed(N)) from containing aligned fields”? Should we not simply have the packed take precedence?
This is the status quo, so no motivation needed. You would need some motivation to switch to packed taking precedence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think there's a pretty obvious semantics for repr(ordered_fields, packed(N)) with repr(aligned) fields -- treat those fields like all other fields, i.e., cap their alignment to N. Not sure if that's worth punting to a future possibility.
I would rather not scope creep this RFC to one that's too large and controversial to accept. Given that there is already a whole other RFC for exactly this case, I don't think this is a minor addition.
So, I'm trying to keep this RFC as small as possible. So, since it is currently a hard error, I'm keeping it like that. But as a compromise, I will add it as a unresolved question, so we don't lose track of this. If it is not controversial and won't hamper accepting this RFC, I'm happy to add it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see what the lang team says; they often like fixing some answer for those questions to make it more concrete what the RFC could turn into (and this one seems rather uncontroversial to me).
Note that we anyway have to define what the layout is for a repr(ordered_fields, packed)
with a field of generic type, even if that generic type ends up instantiated with a repr(align)
type. There's no way we can skip that question (except by forbidding generic types in repr(ordered_fields, packed)
, or by adding entirely new type system concepts to prevent this instantiation from happening -- neither of which is or should be suggested by the RFC). The current hard error is kind of a red herring, it doesn't actually reliably rule out repr(align) nested inside repr(packed).
> | ||
> - edited version of the [reference](https://doc.rust-lang.org/stable/reference/type-layout.html#the-c-representation) on `repr(C)` | ||
The exact algorithm is deferred to whatever the default target C compiler does with default settings (or if applicable, the most commonly used settings). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should repr(C)
do when the corresponding C code is rejected by the default target C compiler?
This affects, for instance, fieldless structs, which are rejected by MSVC (but accepted by GCC). By extension it then also affects structs where all fields are PhantomData
, as those should arguably be removed when translating a Rust type to a C type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should specify that repr(Rust)
or repr(ordered_fields)
1-ZSTs (incl. tuples) are always removed, but otherwise if there is no direct C equivalent (e.g. repr(Rust)
or repr(ordered_fields)
fields, or fieldless struct if not supported by the C compiler), the layout should be consistent within the same compiler version but otherwise unspecified.
Co-authored-by: Ralf Jung <[email protected]>
Add
repr(ordered_fields)
and provide a migration path to switch users fromrepr(C)
torepr(ordered_fields)
, then change the meaning ofrepr(C)
in the next edition.This RFC is meant to be an MVP, and any extensions (for example, adding more
repr
s) are not in scope. This is done to make it as easy as possible to accept this RFC and make progress on the issue ofrepr(C)
serving two opposing roles.Rendered
To avoid endless bikeshedding, I'll make a poll if this RFC is accepted with all the potential names for the new
repr
. If you have a new name, I'll add it to the list of names in the unresolved questions section, and will include it in the poll.